Brazilian Patent BR112019019163A2: Sphere equator projection for efficient 360-degree video compression

Abstract:
Systems and methods for processing 360-degree video data are provided. In various implementations, a spherical representation of a 360-degree video frame can be segmented into an upper region, a lower region, and an intermediate region. Using a cylindrical equal-area projection, such as Lambert's cylindrical equal-area projection, the intermediate region can be mapped into one or more rectangular areas of an output video frame.
Publication number: BR112019019163A2
Application number: R112019019163
Filing date: 2018-03-21
Publication date: 2020-04-14
Inventors: Van Der Auwera Geert; Karczewicz Marta; Coban Muhammed
Applicant: Qualcomm Inc
Main IPC class:

Description:

SPHERE EQUATOR PROJECTION FOR EFFICIENT 360-DEGREE VIDEO COMPRESSION
BACKGROUND
[0001] Virtual reality (VR) describes a three-dimensional, computer-generated environment that can be interacted with in a seemingly real or physical way. Generally, a user experiencing a virtual reality environment can turn left or right, look up or down, and/or move forward and backward, thus changing their view of the virtual environment. The 360-degree video presented to the user can change accordingly, so that the user's experience is as seamless as in the real world. Virtual reality video can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience.
[0002] To provide a seamless 360-degree view, the video captured by a 360-degree video capture system typically undergoes image stitching. Image stitching in the case of 360-degree video generation involves combining or merging video frames from adjacent cameras in the area where the video frames overlap or would otherwise connect. The result is an approximately spherical frame. Similar to a Mercator projection, however, the merged data is typically represented in a planar form. For example, the pixels in a merged video frame can be mapped onto the planes of a cube shape, or some other three-dimensional, planar shape (for example, a pyramid, an octahedron, a decahedron, etc.). Video capture and video display devices generally operate on a raster-scan principle, meaning that a video frame is treated as a grid of pixels, so square or rectangular planes are typically used to represent a spherical environment.
Petition 870190092083, of 16/09/2019, p. 6/100
[0003] The 360-degree video can be encoded for storage and/or transmission. Video coding standards include International Telecommunication Union (ITU) ITU-T H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG) MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and ITU-T H.265 (also known as ISO/IEC MPEG-4 HEVC) with its extensions.
BRIEF SUMMARY
[0004] In various implementations, techniques and systems are described for processing 360-degree video data to obtain better coding efficiency. These techniques and systems can include using a segmented sphere projection to divide a spherical representation of a 360-degree video frame into a north pole or upper region, a south pole or lower region, and an equatorial or intermediate region. The regions can then be mapped to a two-dimensional, rectangular format that can be easier for coding devices to manipulate. In generating this mapping, a cylindrical equal-area projection can be used to map the equatorial region into the two-dimensional format. Cylindrical equal-area formats modify the aspect ratio of the equatorial region in order to preserve area. Preserving area can result in less of the distortion that would be detrimental to achieving better coding efficiency.
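The area-preserving property noted above can be illustrated with a minimal sketch. It assumes a unit sphere and the standard form of Lambert's cylindrical equal-area projection (x equal to the longitude, y equal to the sine of the latitude); the function names are hypothetical and are not taken from the specification.

```python
import math

def lambert_cylindrical_equal_area(lon, lat):
    """Map (lon, lat) in radians on a unit sphere to plane coordinates (x, y)."""
    return lon, math.sin(lat)

def patch_area_on_sphere(lat_lo, lat_hi, lon_lo, lon_hi):
    """Exact surface area of a latitude/longitude patch on the unit sphere."""
    return (lon_hi - lon_lo) * (math.sin(lat_hi) - math.sin(lat_lo))

def patch_area_on_plane(lat_lo, lat_hi, lon_lo, lon_hi):
    """Area of the rectangle that the same patch occupies after projection."""
    x0, y0 = lambert_cylindrical_equal_area(lon_lo, lat_lo)
    x1, y1 = lambert_cylindrical_equal_area(lon_hi, lat_hi)
    return (x1 - x0) * (y1 - y0)
```

Under these assumptions the two area functions agree for any patch, whereas a mapping that is linear in latitude (such as an equirectangular projection) would generally not preserve the patch area.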
[0005] According to at least one example, a method for encoding video data is provided. In various implementations, the method includes obtaining 360-degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The method further includes segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region. The method further includes mapping, using a cylindrical equal-area projection, the intermediate region to one or more rectangular areas of an output video frame.
[0006] In another example, an apparatus is provided that includes a memory configured to store 360-degree video data and a processor. The 360-degree video data can include a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The processor is configured to and can segment a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region. The processor is configured to and can map, using a cylindrical equal-area projection, the intermediate region to one or more rectangular areas of an output video frame.
[0007] In another example, a non-transitory computer-readable medium is provided having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform operations including obtaining 360-degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The instructions can further cause the one or more processors to perform operations including segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region. The instructions can further cause the one or more processors to perform operations including mapping, using a cylindrical equal-area projection, the intermediate region to one or more rectangular areas of an output video frame.
[0008] In another example, an apparatus is provided that includes means for obtaining 360-degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The apparatus further comprises means for segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region. The apparatus also includes means for mapping the intermediate region to one or more rectangular areas of an output video frame. The apparatus further comprises means for mapping, using a cylindrical equal-area projection, the intermediate region to one or more rectangular areas of an output video frame.
[0009] In some aspects, the video frame is segmented at a first latitude above an equator of the spherical representation and a second latitude below the equator, wherein the first latitude and the second latitude are equidistant from the equator, wherein the upper region is above the first latitude, and wherein the lower region is below the second latitude. In some aspects, the middle region includes two-thirds of the area of the spherical representation.
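As a worked illustration of the two-thirds figure, the fraction of a sphere's surface lying between two latitudes equidistant from the equator equals the sine of that latitude. The sketch below uses this identity; the function names are hypothetical, and the computed latitude is an illustration rather than a value stated in the specification.

```python
import math

def equatorial_area_fraction(lat_deg):
    """Fraction of a sphere's surface between -lat_deg and +lat_deg."""
    return math.sin(math.radians(lat_deg))

def latitude_for_fraction(fraction):
    """Latitude (degrees) at which the symmetric band covers `fraction` of the sphere."""
    return math.degrees(math.asin(fraction))

# A band covering two-thirds of the sphere is bounded at roughly +/-41.81 degrees.
two_thirds_latitude = latitude_for_fraction(2.0 / 3.0)
```

For comparison, under the same identity a band bounded at 45 degrees north and south covers about 70.7 percent of the surface.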
[0010] In some aspects, mapping the middle region includes selecting a pixel location in the output video frame and determining a point on the spherical representation corresponding to the pixel location, wherein the point on the spherical representation is determined using a mapping that converts a two-dimensional rectangle to a three-dimensional sphere. These aspects further include sampling a pixel at the point on the spherical representation and assigning the sampled pixel to the pixel location.
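The select, locate, sample, and assign steps can be sketched as follows. This is a minimal illustration under stated assumptions: the rectangle spans the full 360 degrees of longitude, rows run from north to south, pixel centers are sampled, and `sample_sphere` is a hypothetical callable standing in for whatever sampling the encoder actually uses.

```python
import math

def pixel_to_sphere(i, j, width, height, lat_max):
    """Inverse cylindrical equal-area mapping: pixel (i, j) -> (lon, lat) in radians."""
    u = (i + 0.5) / width    # horizontal position in [0, 1], pixel center
    v = (j + 0.5) / height   # vertical position in [0, 1], top row first
    lon = (u - 0.5) * 2.0 * math.pi
    lat = math.asin((1.0 - 2.0 * v) * math.sin(lat_max))
    return lon, lat

def render_middle_region(sample_sphere, width, height, lat_max):
    """Fill the output rectangle by sampling the sphere at each pixel's point."""
    return [[sample_sphere(*pixel_to_sphere(i, j, width, height, lat_max))
             for i in range(width)]
            for j in range(height)]
```

Iterating over output pixels, rather than over points on the sphere, ensures that every location in the output rectangle receives exactly one sample.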
[0011] In some aspects, the middle region includes a left view, a front view, and a right view, wherein the left view is placed in the output video frame adjacent to the front view, and wherein the right view is placed adjacent to the front view.
[0012] In some aspects, the middle region includes a rear view, wherein the lower region is placed in the output video frame adjacent to the rear view, and wherein the upper region is placed adjacent to the rear view.
[0013] In some aspects, the methods, computer-readable medium, and apparatuses described above can further include mapping the upper region into the output video frame and mapping the lower region into the output video frame.
[0014] In some aspects, the output video frame has a three-by-two aspect ratio.
[0015] According to at least one example, a method for encoding video data is provided. In various implementations, the method includes obtaining 360-degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame. The method further includes identifying one or more rectangular areas of a video frame from the plurality of video frames. The method further includes mapping, using a cylindrical equal-area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[0016] In another example, an apparatus is provided that includes a memory configured to store 360-degree video data and a processor. The 360-degree video data can include a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame. The processor is configured to and can identify one or more rectangular areas of a video frame from the plurality of video frames. The processor is configured to and can map, using a cylindrical equal-area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[0017] In another example, a non-transitory computer-readable medium is provided having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform operations including obtaining 360-degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame. The instructions can further cause the one or more processors to perform operations including identifying one or more rectangular areas of a video frame from the plurality of video frames. The instructions can further cause the one or more processors to perform operations including mapping, using a cylindrical equal-area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[0018] In another example, an apparatus is provided that includes means for obtaining 360-degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame. The apparatus further comprises means for identifying one or more rectangular areas of a video frame from the plurality of video frames. The apparatus further comprises means for mapping, using a cylindrical equal-area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[0019] In some aspects, the upper region includes a surface of the spherical representation above a first latitude of the spherical representation, wherein the lower region includes a surface of the spherical representation below a second latitude of the spherical representation, wherein the first latitude and the second latitude are equidistant from an equator of the spherical representation. In some aspects, the one or more rectangular areas include two-thirds of an area of the video frame.
[0020] In some aspects, mapping the one or more rectangular areas includes selecting a point on the spherical representation and determining a pixel location in the video frame that corresponds to the point, wherein the pixel location is determined using a mapping that converts a three-dimensional sphere to a two-dimensional rectangle. These aspects further include sampling a pixel from the pixel location and assigning the sampled pixel to the point.
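The sphere-to-rectangle direction can be sketched in the same spirit. The sketch assumes a single rectangle covering the full 360 degrees of longitude, rows running from north to south, and nearest-pixel sampling; the function names and the frame layout (a list of rows) are hypothetical.

```python
import math

def sphere_to_pixel(lon, lat, width, height, lat_max):
    """Cylindrical equal-area mapping: (lon, lat) in radians -> pixel (i, j)."""
    u = lon / (2.0 * math.pi) + 0.5
    v = 0.5 * (1.0 - math.sin(lat) / math.sin(lat_max))
    # Clamp so that the far edges (u == 1 or v == 1) fall on the last pixel.
    i = min(int(u * width), width - 1)
    j = min(int(v * height), height - 1)
    return i, j

def sample_for_point(frame, lon, lat, lat_max):
    """Return the pixel of `frame` (a list of rows) that corresponds to the point."""
    i, j = sphere_to_pixel(lon, lat, len(frame[0]), len(frame), lat_max)
    return frame[j][i]
```

A practical implementation would likely interpolate between neighboring pixels rather than take the nearest one; nearest-pixel sampling is used here only to keep the sketch short.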
[0021] In some aspects, the one or more additional rectangular areas include a left view, a front view, and a right view, wherein the left view is located adjacent to the front view, and wherein the right view is adjacent to the front view.
[0022] In some aspects, the one or more additional rectangular areas include a rear view, wherein the first rectangular area is adjacent to the rear view, and wherein the second rectangular area is adjacent to the rear view.
[0023] In some aspects, the methods, computer-readable medium, and apparatuses discussed above further include mapping a first rectangular area of the video frame to the upper region and mapping a second rectangular area of the video frame to the lower region.
[0024] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.
[0025] The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] Illustrative embodiments of the present invention are described in detail below with reference to the following figures:
[0027] Figure 1A illustrates a video frame that includes an equirectangular projection of a 360-degree video frame.
[0028] Figure 1B illustrates a video frame that includes a cube mapping projection of a 360 degree video frame.
[0029] Figure 2A is a diagram illustrating the segmented sphere projection from the surface of a sphere to a vertical mapping.
[0030] Figure 2B is a diagram that illustrates an alternative mapping for faces or views that can be generated using segmented sphere mapping.
[0031] Figure 3 is a diagram that illustrates an example application of the Lambert cylindrical equal-area projection to the equatorial segment of a sphere.
[0032] Figure 4 is a diagram that illustrates an example of mapping a circle to a square or a square to a circle.
[0033] Figure 5 is a diagram that illustrates an example of mapping a circle to a square and a square to a circle.
[0034] Figure 6 illustrates an example of a video frame that was mapped from 360-degree video data using a cylindrical equal-area projection for the equatorial region and a circle-to-square mapping for the polar regions.
[0035] Figure 7 is a flowchart that illustrates an example of a process for processing video data according to the techniques discussed here.
[0036] Figure 8 is a flowchart that illustrates an example of a process for processing video data according to the techniques discussed here.
[0037] Figure 9 is a block diagram illustrating an example of a coding device.
[0038] Figure 10 is a block diagram illustrating an example of a decoding device.
DETAILED DESCRIPTION
[0039] Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments can be applied independently and some of them can be applied in combination, as would be apparent to those of skill in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be apparent that various embodiments can be practiced without these specific details. The figures and description are not intended to be restrictive.
[0040] The following description provides examples only and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of various examples will provide those skilled in the art with an enabling description for implementing any of the examples. It should be understood that various changes can be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.
[0041] Specific details are provided in the description below to provide a complete understanding of the examples. However, it will be understood by a person skilled in the art that examples can be practiced without these specific details. For example, circuits, systems, networks, processes and other components can be shown as components in the form of a block diagram so as not to obscure the examples in unnecessary details. In other cases, well-known circuits, processes, algorithms, structures and techniques can be shown without unnecessary details to avoid obscuring the examples.
[0042] Furthermore, it is noted that individual examples can be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart can describe operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.
[0043] The term computer-readable medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium can include a non-transitory medium in which data can be stored and which does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium can include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory, or memory devices. A computer-readable medium can have stored thereon code and/or machine-executable instructions that can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded, or transmitted by any suitable means, including memory sharing, message passing, token passing, network transmission, or the like.
[0044] In addition, various examples can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware, or microcode, the program code or code segments that perform the necessary tasks (for example, a computer program product) can be stored in a computer-readable or machine-readable medium. One or more processors can perform the necessary tasks.
[0045] Virtual reality (VR) describes a three-dimensional, computer-generated environment with which one can interact in a seemingly real or physical way. In some cases, a user experiencing a virtual reality environment uses electronic equipment, such as a head-mounted display (HMD) and, optionally, other wearable items, such as gloves fitted with sensors, to interact with the virtual environment. As the user moves in the real world, images rendered in the virtual environment also change, giving the user the perception that the user is moving within the virtual environment. In some cases, the virtual environment includes sound that correlates with the user's movements, giving the user the impression that the sounds originate from a particular direction or source. Virtual reality video can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience. Virtual reality applications include gaming, training, education, sports video, and online shopping, among others.
[0046] 360-degree video is video captured for display in a virtual reality environment. For example, a frame of 360-degree video can include a full 360 degrees visible from a given point, such that the frame includes pixels for all or part of a sphere centered on the point. 360-degree video data can also be referred to as spherical video, because 360-degree video captures a view in all directions, such that each 360-degree video frame can be visualized as a sphere of captured pixels. A frame of 360-degree video can be computer-generated, and can be used to present fictional environments. In some applications, video from the real world can be used to present a virtual reality environment. In these applications, a user can experience another location in the same way that the user can experience a present location. For example, a user can experience a walking tour of Berlin while using a 360-degree video system located in San Francisco.
[0047] A 360-degree video system can include a video capture device and a video display device, and possibly also other intermediate devices such as servers, data storage, and data transmission equipment. A video capture device can include a camera set, that is, a set of multiple cameras, each oriented in a different direction and capturing a different view. In various applications, two to six cameras can be used to capture a full 360-degree view centered on the camera set's location. Some video capture devices may use fewer cameras, such as video capture devices that primarily capture side-to-side views. A video includes frames, where a frame is an electronically coded still image of a scene. The cameras capture a certain number of frames per second, which is referred to as the camera's frame rate.
[0048] In some cases, to provide a seamless 360-degree view, the video captured by each of the cameras in the camera set undergoes image stitching. Image stitching in the case of 360-degree video generation involves combining or merging video frames from adjacent cameras in the area where the video frames overlap or would otherwise connect. The result is an approximately spherical frame of video data. To integrate with existing video systems, the spherical frame of video data can be mapped to a planar format. For the mapping, techniques such as those used to generate Mercator projections can be used to produce an equirectangular format. As another example, the pixels in a merged video frame can be mapped onto the planes of a cube shape, or some other three-dimensional, planar shape (for example, a pyramid, an octahedron, a decahedron, etc.). Video capture and video display devices operate on a raster-scan principle, meaning that a video frame is treated as a grid of pixels, so square or rectangular planes are typically used to represent a spherical environment.
[0049] 360-degree video frames, mapped to a planar representation, can be encoded and/or compressed for storage and/or transmission. Encoding and/or compression can be accomplished using a video codec (for example, a codec that is compliant with the High Efficiency Video Coding (HEVC) standard, which is also known as H.265, or a codec that is compliant with the Advanced Video Coding standard, which is also known as H.264, or another suitable coding standard), which results in an encoded and/or compressed video bitstream or group of bitstreams. The encoding of video data using a video codec is described in more detail below.
[0050] In some implementations, the encoded video bitstream(s) can be stored and/or encapsulated in a media format or file format. The stored bitstream(s) can be transmitted, for example, over a network, to a receiving device that can decode and render the video for display. Such a receiving device may be referred to herein as a video display device. For example, a 360-degree video system can generate encapsulated files from the encoded video data (for example, using an International Organization for Standardization (ISO) base media file format and/or derived file formats). For instance, the video codec can encode the video data, and an encapsulation engine can generate the media files by encapsulating the video data in one or more ISO format media files. Alternatively or additionally, the stored bitstream(s) can be provided directly from a storage medium to a receiving device.
[0051] A receiving device can also implement a codec to decode and/or decompress an encoded video bitstream. In cases where the encoded video bitstream(s) are stored and/or encapsulated in a media format or file format, the receiving device can support the media or file format that was used to encapsulate the video bitstream, and can extract the video (and possibly also audio) data to generate the encoded video data. For example, the receiving device can parse the media files with the encapsulated video data to generate the encoded video data, and the codec in the receiving device can decode the encoded video data.
[0052] The receiving device can then send the decoded video signal to a rendering device (for example, a video display device, a player device, or another suitable rendering device). Rendering devices include, for example, head-mounted displays, virtual reality televisions, and other 180-degree or 360-degree display devices. Generally, a head-mounted display is able to track the movement of the user's head and/or the movement of the user's eyes. The head-mounted display can use the tracking information to render the part of a 360-degree video that corresponds to the direction in which the user is looking, so that the user experiences the virtual environment in the same way that the user would experience the real world. A rendering device can render a video at the same frame rate at which the video was captured or at a different frame rate.
[0053] Projections and mappings are used to represent three-dimensional (3-D) surfaces on two-dimensional (2-D) maps. For example, in 360-degree video applications, projections and mappings can be used to map a 360-degree video frame, which captures pixels in all directions from the camera and can thus be visualized as a sphere, onto a two-dimensional video frame. Examples of two-dimensional projections include an equirectangular projection (ERP) and a cube map projection (CMP), among others. Figure 1A illustrates a video frame 110 that includes an equirectangular projection of a 360-degree video frame. An equirectangular projection maps points on a sphere to a two-dimensional map by linearly mapping the latitude and longitude of the points on the sphere to (x, y) coordinates in the video frame 110. The equirectangular projection is able to include all pixels from the 360-degree video frame in the two-dimensional video frame 110, so transitions from one area of the video frame 110 to another are seamless. Seamless transitions mean that an equirectangular video frame can encode efficiently, in terms of the size of the encoded video frame. This is because operations such as motion estimation and motion compensation produce better results when motion between video frames appears continuous.
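The linear latitude/longitude mapping described above can be sketched as follows. The sketch assumes longitude in [-pi, pi] increasing to the right and latitude in [-pi/2, pi/2] with north at the top of the frame; these conventions and the function name are assumptions for illustration only.

```python
import math

def equirectangular_pixel(lon, lat, width, height):
    """Linearly map (lon, lat) in radians to (x, y) coordinates in a width x height frame."""
    x = (lon + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - lat) / math.pi * height
    return x, y
```

Because both coordinates are linear in the angles, equal steps in latitude occupy equal numbers of rows regardless of how much surface area on the sphere those steps actually cover.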
[0054] Figure 1B illustrates a video frame 120 that includes a cube map projection of a 360-degree video frame. The cube map projection projects points on the surface of a sphere to points on planes that are tangent to the sphere's surface. That is, the pixels are fitted onto the six faces of a cube, where the height, width, and length of the cube can be such that the cube fits within the sphere. The example in Figure 1B is a 3 x 2 arrangement; that is, three cube faces across and two cube faces high. The 3 x 2 arrangement results in an aspect ratio that can encode efficiently. For example, less data per line of pixels needs to be stored than if an arrangement such as 1 x 2 is used.
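One common way to assign a viewing direction to a cube face is to pick the axis with the largest absolute component and divide the other two components by it. The sketch below illustrates this; the axis conventions (+z front, +x right, +y up), the face names, and the function name are assumptions, and the face rotations and placement shown in Figure 1B are not modeled.

```python
def direction_to_cube_face(x, y, z):
    """Return (face, u, v) for a nonzero direction vector, with u and v in [-1, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:
        # Dominant z component: front or back face.
        return ("front" if z > 0 else "back"), x / az, y / az
    if ax >= ay:
        # Dominant x component: right or left face.
        return ("right" if x > 0 else "left"), z / ax, y / ax
    # Dominant y component: top or bottom face.
    return ("top" if y > 0 else "bottom"), x / ay, z / ay
```

The returned (u, v) pair locates the point within the chosen face; a full implementation would also fix the sign and orientation of u and v per face so that adjacent faces line up along their shared edges.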
[0055] In the example video frame 120 of Figure 1B, a cube face that can be considered a front face 122 is placed in the middle of the upper half of the video frame 120. The cube faces to the right and left of the front face (for example, a right face 124 and a left face 126) are placed on the right and left sides, respectively, of the upper half of the video frame 120. The cube face that can be considered the rear face 128 is rotated -90 degrees and placed in the center of the lower half of the video frame 120. The cube face that can be considered the upper or top face 130 is placed to the right of the rear face 128, and is also rotated so that the edge of the top face 130 matches the edge of the rear face 128. The cube face that can be considered the lower or bottom face 132 is placed to the left of the rear face 128, rotated to match the edge of the rear face 128.
[0056] In the example in Figure 1B, the pixels included in the front face 122 were selected as the view to be placed directly in front of the observer. In other examples, a different part of the video data can be selected to be the front view. Additionally, the arrangement of the cube faces illustrated in the example video frame 120 of Figure 1B is an example arrangement. Other arrangements are also possible.
[0057] A cube mapping projection can be more compact than an equirectangular projection, due to some compression of pixel data that occurs at the edges of the cube faces. Cube mapping also produces less image distortion, which can improve coding efficiency.
[0058] Another projection is one called the segmented sphere projection (SSP). Segmented sphere projection is described in Y. Ye, E. Alshina, and J. Boyce, Algorithm descriptions of projection format conversion and video quality metrics in 360Lib, JVET-E1003, January 2017 (hereinafter JVET-E1003), which is incorporated herein by reference, in its entirety and for all purposes. Figure 2A illustrates the segmented sphere projection from the surface of a sphere 202 to an example vertical two-dimensional mapping 210 generated according to the segmented sphere projection. The segmented sphere projection divides the sphere into three segments: a north pole region 204, a south pole region 208, and an equatorial region 206. The north pole and south pole regions are also referred to here as sphere poles or sphere pole segments. In the illustrated example, the three segments are divided at a latitude of 45 degrees north and 45 degrees south (for example, as measured from the center of sphere 202). In other examples, the three segments can be divided at a different degree of latitude.
[0059] In the two-dimensional mapping example 210 shown in Figure 2A, the area covered by the north pole region 204 is mapped to a first circular region, which will be referred to as a top view 214. Similarly, the area covered by the south pole region 208 is mapped to a second circular region, which will be referred to as a bottom view 218. In this example, bottom view 218 is placed on mapping 210 next to, and below, top view 214. Top view 214 and bottom view 218 are also labeled Face 0 and Face 1, respectively. Equatorial region 206 is divided into four equal segments, and each segment is mapped to a square area; the square areas are placed on mapping 210 one below another, below bottom view 218. For the purposes of this example, the square areas for the equatorial region 206, from top to bottom, will be referred to as the left view 216a, the front view 216b, the right view 216c, and the rear view 216d, or Face 2, Face 3, Face 4, and Face 5, respectively. The numerical labels for the left view 216a, front view 216b, right view 216c, and rear view 216d have been rotated -90 degrees to illustrate the orientation of pixels placed in these views; in this example, in all four views north is oriented to the right and south to the left. In other examples, the left, right, front, and rear views can be arranged in different orders and with different north-south orientations than what is illustrated here. In other examples, the areas to which the equatorial region 206 is mapped may not be square. For example, when an angle other than 45 degrees is used to delineate the polar regions, rectangular areas that are non-square can better fit the pixel data and can result in less distortion than if, in this example, the data were mapped to square areas.
[0060] In a video application, the pixels of each of the north pole region 204 and the south pole region 208 can be mapped to the circular regions of the top view 214 and the bottom view 218, respectively, using an angular projection commonly known as a fisheye projection. In this example, the diameter of the circular regions in each of the top view 214 and the bottom view 218 is the same as the edge of each of the equatorial segments, due to each view covering 90 degrees of latitude. Each of the left view 216a, front view 216b, right view 216c, and rear view 216d can be generated using the projection used to generate the equirectangular projection, which can result in relatively smooth transitions between these views.
[0061] Figure 2B is a diagram illustrating an alternative mapping 220 for faces or views that can be generated using segmented sphere mapping. In the example in Figure 2B, the views are arranged in a 3 x 2 shape, that is, three faces across and two faces high. In this mapping 220, front view 216b is placed in the middle of the top half of mapping 220. Left view 216a and right view 216c are placed to the left and right, respectively, of front view 216b. The rear view 216d is rotated -90 degrees and placed in the middle of the lower half of the mapping 220. The top view 212 is also rotated such that the left edge of the top view is aligned with the right edge of the rear view 216d, and placed to the right of the rear view 216d. The bottom view 218 is also rotated, so that the right edge of the bottom view 218 aligns with the left edge of the rear view 216d, and is placed to the left of the rear view 216d. In this example, aligning means that at least a few pixels from each view that would be adjacent in the original sphere 202 are adjacent in mapping 220. In this example, the corner areas of the top view 212 and bottom view 218 that are outside the fisheye projection are filled with a gray color. In other examples, these corner areas can be filled with another color.
[0062] Various techniques can be used to map the equatorial region of the segmented sphere mapping to one or more regions of a two-dimensional video frame. For example, an equirectangular projection can be used, or a cube mapping projection. These projections can cause an undesirable amount of distortion in the video frame. For example, an equirectangular projection stretches the polar regions across the width of the projection, and also compresses these areas. As another example, the cube mapping projection can result in non-linear transitions between the faces of the cube, so that a boundary between the faces of the cube is visible.
[0063] These and other distortions, in addition to resulting in visible defects when a video frame is rendered, can reduce coding efficiency. For example, some video compression algorithms look for continuous motion between video frames and/or blocks in a video frame that are visually similar to other blocks in the same video frame or another video frame. Distortion in a video frame can cause what should be continuous motion to appear discontinuous. In addition, or alternatively, blocks that were similar in the original 360-degree video can be distorted so that the pixels in the blocks are no longer similar. These and other problems can reduce the ability of video compression algorithms to efficiently encode a two-dimensional video frame, resulting in a larger compressed bit stream.
[0064] In various implementations, systems and methods are provided for processing 360-degree video data, using a segmented sphere projection, that avoid the problems discussed above. In various implementations, segmented sphere projection can be used to map a 360-degree video frame to a two-dimensional, rectangular format, which may be easier for video transmitters and receivers to handle. When generating this mapping, a cylindrical equal area projection can be used to map the equatorial region of the segmented sphere projection to a two-dimensional representation. Cylindrical equal area projections can result in less distortion in a video frame. Reducing distortion can increase coding efficiency over projections that produce more distorted video frames. Increased coding efficiency can result in better compression and smaller encoded bit streams.
[0065] Several cylindrical equal area projections can be used to map the equatorial segment of the segmented sphere projection to a two-dimensional format. For example, a video encoding system can apply the Lambert cylindrical equal area projection to perform the mapping. Lambert's cylindrical equal area projection is one of a class of projections for projecting spherical shapes into two-dimensional shapes, where the two-dimensional shape has no distortion along the sphere's equator and distortion that increases between the equator and the poles. Equal area projections preserve the area of the sphere, at the expense of visual distortion in the polar regions. Other cylindrical equal area projections include Behrmann, Gall-Peters and others, any of which can be used to convert a 360-degree video frame into a two-dimensional format. Lambert's projection provides the simplest formulas, and has been shown to result in better coding efficiency than at least some more complex cylindrical equal area projections.
[0066] Figure 3 is a diagram illustrating an example of applying Lambert's cylindrical equal area projection to the equatorial region 306 of a sphere 302. The sphere 302 in this example was segmented according to the segmented sphere projection, and so includes a north pole region 304 and a south pole region 308 in addition to the equatorial region 306. In the example illustrated in Figure 3, the latitude at which the north pole region 304 and the south pole region 308 are delineated is $\pm\sin^{-1}\left(\frac{2}{3}\right) \approx \pm 41.81^\circ$, which was chosen so that the equatorial region 306 includes two thirds of the total sphere area and each polar segment includes one-sixth of the sphere area.
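The choice of boundary latitude can be checked numerically. The sketch below relies on the standard result that the band of a sphere between latitudes -lat and +lat covers a fraction sin(lat) of the total surface; the function and variable names are illustrative only.

```python
import math

def band_area_fraction(lat_rad):
    """Fraction of a sphere's surface lying between latitudes -lat and +lat.

    A spherical zone of height h on a sphere of radius R has area 2*pi*R*h.
    Here h = 2*R*sin(lat), so dividing by the full area 4*pi*R^2 leaves sin(lat).
    """
    return math.sin(lat_rad)

boundary = math.asin(2.0 / 3.0)                  # boundary latitude in radians
equator_share = band_area_fraction(boundary)     # share of the equatorial band
pole_share = (1.0 - equator_share) / 2.0         # share of each polar cap
boundary_deg = round(math.degrees(boundary), 2)  # 41.81
```

With the 45 degree boundary used in the earlier segmented sphere example, the same function gives roughly 0.707, so the equatorial band would hold slightly more than two thirds of the sphere area.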
[0067] Figure 3 illustrates, by way of example, a cylinder 310 onto which the pixels of sphere 302 can be mapped. Cylinder 310 can be unrolled or laid flat and divided into the four faces used in the projection of sphere 302 to the two-dimensional mapping. As a result of equatorial region 306 including two-thirds of the sphere area, when equatorial region 306 is mapped to faces in a two-dimensional mapping (see, for example, Figure 2A and Figure 2B), the number of samples on the equatorial faces is also equal to two thirds of the samples in the two-dimensional mapping. Alternatively, in some examples, the equatorial region 306 illustrated in Figure 3 can be mapped to rectangular faces to preserve the ratio. Examples of projections are described in Aleksandar M. Dimitrijevic, Martin Lambers and Dejan D. Rancic, Comparison of spherical cube map projections used in planet-sized terrain rendering, Facta Universitatis (NIŠ), Ser. Math. Inform., Vol. 31, No. 2 (2016), 259-297, which is incorporated herein by reference, in its entirety and for all purposes.
[0068] As discussed above, the polar segments (for example, the north pole region 304 and the south pole region 308) can be mapped to disk or circular shapes in the two-dimensional mapping of the sphere 302. When mapped to disks, the samples in the two-dimensional mapping for the polar segments amount to less than one third of the samples in the two-dimensional mapping. When the polar segments are expanded to fill square faces, as discussed above, each polar segment can include one sixth of the samples in the two-dimensional mapping.
[0069] Mapping a 360-degree video frame to a two-dimensional rectangular format involves converting from the three-dimensional space of the 360-degree video data to the two-dimensional space of the output video frame. Performing this conversion may include selecting a pixel location, (m, n), in the output video frame, and determining a point $(\phi, \theta)$ on the spherical video data. A pixel sample can be taken from the point designated by $(\phi, \theta)$ and placed at the point (m, n) in the output video frame.
[0070] In some examples, the north pole region 304 and the south pole region 308 can be mapped using an angular fisheye projection, which can also be described as a circular pole mapping. Using a fisheye projection, the polar regions can be mapped into rectangular areas of a video frame while maintaining a circular shape.
[0071] The following equations can be used to map the north pole region 304 (for example, Face 0) to the two-dimensional mapping:
[0072] The following equations can be used to map the south pole region 308 (for example, Face 1) to the two-dimensional mapping:
[Equations (3) and (4) are not recoverable from the source text.]
[0073] The following equations illustrate an example application of the Lambert cylindrical equal area projection to the equatorial region 306. In this example, equatorial region 306 can be mapped to four square regions, identified by f = 2 ... 5 (for example, Faces 2, 3, 4, and 5), using the following equations:
$$\phi = \left(\frac{m + \frac{1}{2}}{A} + f - 2\right)\frac{\pi}{2} - \pi \qquad (5)$$
$$\theta = \sin^{-1}\left(\frac{2}{3}\left(1 - \frac{2}{A}\left(n + \frac{1}{2}\right)\right)\right) \qquad (6)$$
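A minimal sketch of the equatorial mapping follows, assuming that equations (5) and (6) read φ = ((m + 1/2)/A + f - 2)·π/2 - π and θ = sin⁻¹((2/3)(1 - (2/A)(n + 1/2))), where A is the face dimension in pixels. The function name and argument order are hypothetical and chosen only for this illustration.

```python
import math

def equator_pixel_to_sphere(f, m, n, A):
    """Return (phi, theta) for pixel (m, n) on equatorial face f of size A x A.

    Assumes equations (5) and (6) as read above: yaw is linear in m, and the
    sine of the pitch is linear in n, which is what makes the mapping a
    cylindrical equal area projection.
    """
    if f not in (2, 3, 4, 5):
        raise ValueError("equatorial faces are numbered 2 through 5")
    phi = ((m + 0.5) / A + f - 2) * (math.pi / 2.0) - math.pi
    theta = math.asin((2.0 / 3.0) * (1.0 - (2.0 / A) * (n + 0.5)))
    return phi, theta
```

Each pixel row spans the same step in sin θ, so every row covers the same area on the sphere; rows near the top and bottom of a face therefore span a larger range of latitude than rows near the equator.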
[0074] In equations (5) and (6), the yaw (for example, the horizontal angle) is in the range $\phi \in \left(-\pi + (f-2)\frac{\pi}{2},\ -\frac{\pi}{2} + (f-2)\frac{\pi}{2}\right]$, depending on which face f = 2 ... 5 is being mapped, and the pitch (for example, the vertical angle) is in the range $\theta \in \left[-\sin^{-1}\left(\frac{2}{3}\right),\ \sin^{-1}\left(\frac{2}{3}\right)\right]$.
[0075] Figure 4 illustrates an example of a video frame 420 which was mapped from 360 degree video data, using the above equations and a 3 x 2 arrangement, as previously discussed. In this example video frame 420, Face 2, Face 3, and Face 4, which can be referred to as a left view 416a, a front view 416b, and a right view 416c, were placed next to each other in the upper half of the video frame 420. Left view 416a, front view 416b, and right view 416c can thus form a region where pixels appear to transition smoothly between views. In the lower half of the video frame 420, Face 5, which can be referred to as the rear view 416d, was rotated -90 degrees and was placed between Face 1 (the bottom view 418) and Face 0 (the top view 412). The bottom view 418 and the top view 412 have also been rotated to align with the edges of the rear view 416d. Rotation of bottom view 418, rear view 416d, and top view 412 results in continuous pixels at least where bottom view 418 is adjacent to rear view 416d, and where rear view 416d is adjacent to top view 412.
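The 3 x 2 arrangement described for Figure 4 can be sketched as a lookup from face index to the pixel origin of that face in the output frame. The table below is an assumption read from the text: it places the bottom view to the left of the rear view and the top view to the right, as in the Figure 2B description, and it does not model the -90 degree rotations applied to the faces in the lower half. All names are hypothetical.

```python
# Assumed (row, column) position of each face in the 3 x 2 frame.
FACE_GRID = {
    2: (0, 0),  # left view
    3: (0, 1),  # front view
    4: (0, 2),  # right view
    1: (1, 0),  # bottom view (assumed to sit left of the rear view)
    5: (1, 1),  # rear view
    0: (1, 2),  # top view (assumed to sit right of the rear view)
}

def frame_size(A):
    """Width and height of the output frame for faces of size A x A."""
    return 3 * A, 2 * A

def face_origin(f, A):
    """Top-left pixel (x, y) of face f in the output frame."""
    row, col = FACE_GRID[f]
    return col * A, row * A
```

For example, with A = 256 the frame is 768 x 512 pixels, and the rear view starts at pixel (256, 256).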
[0076] As discussed above, a fisheye projection results in pixels from the north and south pole regions each occupying circular areas within the square areas to which the pixels are mapped. The fisheye projection is able to preserve most of the data from the spherical video data, although some loss may occur due to the pixels being warped into a circular shape. In addition, the square regions have corner areas where pixels are filled with gray or some other value, instead of pixel data from the spherical video data. When encoded, the corner areas can reduce coding efficiency, due to containing non-video data. In addition, the corner areas add unnecessary data, since the data from the corner areas will be discarded when the video frame is rendered for display.
[0077] In some examples, circular polar data can be mapped into the square areas of the video frame using a circle-to-square conversion. When the video frame is rendered for display, a video encoding system can use a square-to-circle conversion to reconstruct the polar regions.
[0078] Figure 5 is a diagram illustrating an example of mapping a circle 502 to a square 504 and a square 504 to a circle 502. Several techniques can be used to perform these mappings, some of which are described in M. Lambers, Mappings between Sphere, Disc, and Square, Journal of Computer Graphics Techniques, Vol. 5, No. 2, 2016, which is hereby incorporated by reference, in its entirety and for all purposes. For example, Fernández-Guasti squircle mapping, elliptical arc mapping or another mapping can be used. The use of circle-to-square and square-to-circle conversions to project 360-degree video data into a two-dimensional format and from a two-dimensional format to a 360-degree representation is discussed in more detail in US Order No. 173521), filed, which is incorporated herein by reference in its entirety.
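As one possible illustration of such a conversion, the sketch below uses a commonly cited closed form of the elliptical arc (elliptical grid) mapping between the square [-1, 1] x [-1, 1] and the unit disc. The specification does not state which formulas are used, so the equations, function names, and coordinate normalization here are assumptions.

```python
import math

def square_to_disc(x, y):
    """Map a point (x, y) in [-1, 1]^2 to the unit disc."""
    u = x * math.sqrt(1.0 - 0.5 * y * y)
    v = y * math.sqrt(1.0 - 0.5 * x * x)
    return u, v

def disc_to_square(u, v):
    """Inverse mapping: expand a point (u, v) in the unit disc to [-1, 1]^2."""
    r2 = 2.0 * math.sqrt(2.0)
    a = 2.0 + u * u - v * v
    b = 2.0 - u * u + v * v
    # max(0, ...) guards against tiny negative values caused by rounding.
    x = 0.5 * math.sqrt(max(0.0, a + r2 * u)) - 0.5 * math.sqrt(max(0.0, a - r2 * u))
    y = 0.5 * math.sqrt(max(0.0, b + r2 * v)) - 0.5 * math.sqrt(max(0.0, b - r2 * v))
    return x, y
```

In the terms used above, `disc_to_square` would play the role of the circle-to-square conversion that fills a square face with polar data, and `square_to_disc` the role of the square-to-circle conversion applied when the frame is rendered.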
[0079] Figure 6 illustrates an example of a video frame 620 that was mapped from 360 degree video data using a cylindrical equal area projection for the equatorial region and a circle-to-square mapping for the polar regions. The example video frame 620 arranges the different faces or views three across and two high. As in the example in Figure 4, in Figure 6, Face 2, Face 3, and Face 4, which can be referred to as a left view 616a, a front view 616b, and a right view 616c, were placed next to each other in the top half of the video frame 620. In the bottom half of the video frame 620, Face 5, which can be referred to as the rear view 616d, was rotated -90 degrees and was placed between Face 1 (the bottom view 618) and Face 0 (the top view 612).
[0080] In this example, the polar regions of the bottom view 618 and the top view 612 have been expanded to fill the square areas of the video frame 620 to which the polar regions have been mapped. Bottom view 618 and top view 612 have been rotated to align with the edges of the rear view 616d. As a result, the pixels across bottom view 618, rear view 616d, and top view 612 are almost continuous. In some examples, a small amount of distortion may appear where bottom view 618 meets rear view 616d and/or where top view 612 meets rear view 616d.
[0081] By expanding the circular polar regions into square areas of the video frame, it is no longer necessary to fill the bottom view 618 and the top view 612 with pixel data that can decrease coding efficiency and that will be discarded when the video frame 620 is rendered for display. The arrangement of bottom view 618, rear view 616d, and top view 612 in the bottom half of video frame 620 results in an almost continuous region. Smooth transitions between views are desirable because encoding the video frame can then result in a more compact encoded representation than when transitions are abrupt. In other examples, other arrangements of the views can be used, such as a 1 x 6 arrangement or a 6 x 1 arrangement. Alternatively or additionally, the top and bottom views can be placed at the top or bottom of the video frame 620, on the left or right, or somewhere else in the video frame 620. Alternatively or additionally, other rotations of the top and bottom views can be applied before the top and bottom views are mapped to the video frame, to achieve different almost continuous regions.
[0082] Once mapped to a two-dimensional format, the video frame can be encoded for storage and transport. The video frame can also be mapped back to a three-dimensional spherical representation and then viewed using a 360-degree video playback device.
[0083] To produce a spherical representation from the two-dimensional mapping of a video frame, a video encoding system can perform a three-dimensional to two-dimensional conversion. Performing this conversion can include selecting a point $(\phi, \theta)$ on the sphere and determining a corresponding point (m, n) in the two-dimensional mapping. A pixel can then be sampled from the point in the two-dimensional mapping and placed at the point on the sphere. In the following equations, the dimension of each face is assumed to be A x A.
[0084] The following equations can be used to map the top view (for example, Face 0) to the north pole region:

with $\theta \in \left(\sin^{-1}\left(\frac{2}{3}\right),\ \frac{\pi}{2}\right]$ and $\phi \in (-\pi, \pi]$.
[0085] The following equations can be used to map the bottom view (for example, Face 1) to the south pole region:

[Equations (9) and (10) are not recoverable from the source text.]
[0086] The left, front, right, and rear views, identified by f = 2 ... 5, respectively, which include the equatorial area of the video frame, can be mapped to the equatorial region of the sphere using the following equations:
$$m = \frac{2\phi}{\pi}A + (4 - f)A - \frac{1}{2} \qquad (11)$$
$$n = \frac{A}{2}\left(1 - \frac{3}{2}\sin\theta\right) - \frac{1}{2} \qquad (12)$$
[0087] In equations (11) and (12), the yaw (for example, the horizontal angle) is in the range $\phi \in \left(-\pi + (f-2)\frac{\pi}{2},\ -\frac{\pi}{2} + (f-2)\frac{\pi}{2}\right]$, depending on which face f = 2 ... 5 is being mapped, and the pitch (for example, the vertical angle) is in the range $\theta \in \left[-\sin^{-1}\left(\frac{2}{3}\right),\ \sin^{-1}\left(\frac{2}{3}\right)\right]$.
[0088] Using the projection discussed above to map 360-degree video frames to two-dimensional mappings can improve the coding efficiency of 360-degree video. For example, according to the common test conditions described in J. Boyce, E. Alshina, A. Abbas, Y. Ye, JVET common test conditions and evaluation procedures for 360-degree video, JVET-E1030, which is incorporated herein by reference, in its entirety and for all purposes, the coding gain when using the mapping illustrated in Figure 4 is -11.4%.
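Returning to the inverse equatorial mapping, the following is a minimal sketch assuming that equations (11) and (12) read m = (2φ/π)A + (4 - f)A - 1/2 and n = (A/2)(1 - (3/2) sin θ) - 1/2. The rule used to pick the face index from the yaw is an assumption derived from the stated yaw ranges (each face spans π/2, starting at -π), and the function name is hypothetical.

```python
import math

def sphere_to_equator_pixel(phi, theta, A):
    """Return (f, m, n) for a point (phi, theta) in the equatorial region.

    m and n are fractional pixel coordinates on face f; a caller would
    round or interpolate them before sampling.
    """
    # Assumed face selection: four faces of width pi/2 starting at phi = -pi.
    f = 2 + min(3, int((phi + math.pi) // (math.pi / 2.0)))
    m = (2.0 * phi / math.pi) * A + (4 - f) * A - 0.5
    n = (A / 2.0) * (1.0 - 1.5 * math.sin(theta)) - 0.5
    return f, m, n
```

Under this reading, the point with φ = -π + π/16 and θ = π/6 lands on pixel (0, 0) of Face 2 when A = 4, which is the same point that the forward equations (5) and (6) produce for that pixel.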
[0089] Figure 7 illustrates an example of a process 700 for processing video data according to the techniques discussed above. At 702, process 700 includes obtaining 360 degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. In some examples, 360 degree video data can be obtained directly from a video capture device. In these examples, the spherical representation can include multiple images that have been captured simultaneously, such as multiple rectangular images or one or more fisheye images. Alternatively or additionally, the 360 degree video data may include video frames in which multiple images have been stitched together by the video capture device or another device. In some examples, 360-degree video data obtained in a rectangular format (for example, an equirectangular or cube mapping format) can be mapped to a spherical representation.
[0090] At 704, process 700 includes segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region. The upper region includes a first circular area of the spherical representation. The lower region includes a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area. The middle region includes an area of the spherical representation not included in the upper or lower region. The video frame can be segmented at a first latitude above an equator of the spherical representation and a second latitude below the equator. The first latitude and the second latitude can be equidistant from the equator. In some examples, the angle of the latitudes is 41.81 degrees from the equator. In other examples, the angle of the latitudes is greater or less than 41.81 degrees. In some examples, the middle region includes two thirds of the area of the spherical representation.
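The segmentation step can be sketched as a simple classification of a point by its latitude. The region labels, the function name, and the choice to assign points exactly on a boundary to the middle region are assumptions made for this example.

```python
import math

# Default boundary: the latitude at which the middle band holds 2/3 of the area.
DEFAULT_BOUNDARY = math.asin(2.0 / 3.0)

def classify_latitude(theta, boundary=DEFAULT_BOUNDARY):
    """Return which region a point at latitude theta (radians) belongs to.

    Points exactly on a boundary latitude are assigned to the middle
    region here; that tie-breaking rule is an assumption.
    """
    if theta > boundary:
        return "upper"
    if theta < -boundary:
        return "lower"
    return "middle"
```

A different `boundary` value, such as `math.radians(45.0)`, reproduces the 45 degree segmentation of the earlier segmented sphere example.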
[0091] At 706, process 700 includes mapping, using a cylindrical equal area projection, the intermediate region to one or more rectangular areas of an output video frame. The cylindrical equal area projection can be, for example, Lambert's cylindrical equal area projection. Mapping the middle region can include, for example, selecting a pixel location in the output video frame and determining a point in the spherical representation corresponding to the pixel location. In this example, the point in the spherical representation can be determined using a mapping to convert a two-dimensional rectangle to a three-dimensional sphere, such as an equirectangular projection. Mapping the middle region can also include sampling a pixel at the point in the spherical representation, and assigning the sampled pixel to the pixel location in the video frame. Using a cylindrical equal area projection to map the middle region preserves the area of the middle region when the middle region is mapped to the output video frame. In preserving the area, the aspect ratio of the intermediate region can be modified. In contrast, a projection such as an equirectangular projection preserves the aspect ratio of the intermediate region while changing the area. Preserving the area of the middle region can improve coding efficiency compared with preserving the aspect ratio.
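The select, determine, sample, and assign steps described above can be sketched as a loop over the pixels of one equatorial face. The sketch assumes the Lambert mapping takes the form φ = ((m + 1/2)/A + f - 2)·π/2 - π and θ = sin⁻¹((2/3)(1 - (2/A)(n + 1/2))); `sample_sphere` is a hypothetical callback standing in for whatever access the spherical representation provides.

```python
import math

def render_equator_face(f, A, sample_sphere):
    """Build an A x A face (list of rows) for equatorial face f.

    sample_sphere(phi, theta) is a caller-supplied, hypothetical function
    that returns the pixel value of the spherical representation at the
    given yaw and pitch.
    """
    face = []
    for n in range(A):  # select a pixel row in the output face
        theta = math.asin((2.0 / 3.0) * (1.0 - (2.0 / A) * (n + 0.5)))
        row = []
        for m in range(A):  # select a pixel column in the output face
            phi = ((m + 0.5) / A + f - 2) * (math.pi / 2.0) - math.pi
            row.append(sample_sphere(phi, theta))  # sample, then assign
        face.append(row)
    return face
```

For instance, `render_equator_face(3, 4, lambda phi, theta: theta > 0)` yields a 4 x 4 face whose top two rows are `True` and whose bottom two rows are `False`, reflecting that the upper half of each face lies north of the equator.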
[0092] In some examples, the middle region includes parts that can be designated a left view, a front view, and a right view. In these examples, the part designated as the left view can be placed on the output video frame adjacent to the part designated as the front view. Additionally, the part designated as the right view is placed adjacent to the front view. In these examples, the left, front, and right views can form a continuous area in the output video frame, where continuous means that pixels that are adjacent in the spherical representation are placed adjacent to each other in the output video frame.
[0093] In some examples, the middle region includes a part that can be designated as a rear view. In these examples, the lower region can be placed on the output video frame adjacent to the part designated as the rear view, and the upper region can also be placed adjacent to the rear view. In these examples, the lower region and the upper region can form an area in the output video frame that is substantially continuous.
[0094] In some examples, process 700 further includes mapping the upper region to the output video frame. The upper region can be mapped using an angular fisheye projection and/or a projection that converts a circular area into a square area. In these examples, process 700 also includes mapping the lower region to the output video frame. The lower region can be mapped using an angular fisheye projection and/or a projection that converts a circular area into a square area.
[0095] In some examples, the output video frame has a three-by-two aspect ratio. A three-by-two aspect ratio can be encoded more efficiently than other aspect ratios. In some examples, the output video frame can be encoded, using, for example, the HEVC or AVC codec (or another codec), for storage and/or transmission.
[0096] Figure 8 illustrates an example of a process 800 for processing video data according to the techniques discussed above. At 802, process 800 includes obtaining 360-degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame. In some examples, 360-degree video data can be obtained from an encoded bit stream. The encoded bit stream may have been read from a storage location and/or may have been received from a transmission. In these examples, the bit stream can be decoded into rectangular video frames.
[0097] At 804, process 800 includes identifying one or more rectangular areas of a video frame from the plurality of video frames. The one or more rectangular areas can include, for example, a left view, a front view, a right view, and/or a rear view. In some examples, the one or more rectangular areas include two-thirds of the area of the video frame.
[0098] At 806, process 800 includes mapping, using a cylindrical equal area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, where the intermediate region is located between the upper region and the lower region. The cylindrical equal area projection can be, for example, Lambert's cylindrical equal area projection. Mapping the one or more rectangular areas may include, for example, selecting a point in the spherical representation, and determining a pixel location in the video frame that corresponds to the point. The pixel location can be determined using a mapping to convert a three-dimensional sphere to a two-dimensional rectangle, such as an equirectangular projection, a cube mapping projection, or another projection. Mapping the one or more rectangular areas can also include sampling a pixel from the pixel location, and assigning the sampled pixel to the point in the spherical representation.
[0099] In some examples, process 800 may also include mapping a first rectangular area of the video frame to the upper region, and mapping a second rectangular area of the video frame to the lower region. The first rectangular area and/or the second rectangular area can be mapped using an angular fisheye projection and/or a projection to convert a square area to a circular area.
[0100] The upper region can include, for example, a surface of the spherical representation that is above a first latitude of the spherical representation. As another example, the lower region can include a surface of the spherical representation below a second latitude of the spherical representation. In this example, the first latitude and the second latitude can be equidistant from an equator of the spherical representation. In some examples, the latitudes are 41.81 degrees from the equator. In some examples, the latitudes are greater than or less than 41.81 degrees from the equator.
[0101] In some examples, the video frame has a three-by-two aspect ratio. In these examples, the video frame can include two rows of three views or faces.
[0102] In some examples, the one or more rectangular areas include areas that can be designated as a left view, a front view and a right view. In these examples, the area designated as the left view can be located adjacent to the area designated as the front view, and the area designated as the right view can also be located adjacent to the front view. In these examples, the left, front, and right views can form a continuous area in the video frame.
[0103] In some examples, the one or more rectangular areas include an area that can be designated as a rear view. In these examples, the first rectangular area can be adjacent to the area designated as the rear view, and the second rectangular area can also be adjacent to the rear view. In these examples, the first rectangular area, the rear view, and the second rectangular area can form a continuous area in the video frame.
[0104] In some examples, processes 700, 800 can be performed by a computing device or an apparatus, such as a video encoding device. A video encoding device can include, for example, a video encoding system and/or a video decoding system. In some cases, the computing device or apparatus may include a processor, microprocessor, microcomputer, or other component of a device configured to perform the steps of processes 700, 800. In some examples, the computing device or apparatus may include a camera configured to capture video data (for example, a video stream) including video frames. For example, the computing device can include a camera device (for example, an IP camera or other type of camera device) that can include a video codec. In some examples, a camera or other capture device that captures video data is separate from the computing device, in which case the computing device receives the captured video data. The computing device may further include a network interface configured to communicate video data. The network interface can be configured to communicate data based on Internet Protocol (IP).
[0105] Processes 700, 800 are illustrated as logic flow diagrams, the operations of which represent a sequence of operations that can be implemented in hardware, computer instructions or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures and the like that perform specific functions or implement specific data types. The order in which the operations are described is not intended to be interpreted as a limitation, and any number of the operations described can be combined in any order and/or in parallel to implement the processes.
[0106] Additionally, processes 700, 800 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (for example, executable instructions, one or more computer programs or one or more applications) running collectively on one or more processors, by hardware or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer or machine-readable storage medium may be non-transitory.
[0107] A video coding system, including an encoding system and/or a decoding system, can be used to encode and/or decode video data. An example video encoding and decoding system includes a source device that provides encoded video data to be decoded at a later time by a destination device. In particular, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can include any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[0108] Video data captured by a camera (for example, a fisheye camera, or other suitable camera or cameras) can be encoded to reduce the amount of data needed for transmission and storage. The coding techniques can be implemented in an example video encoding and decoding system. In some examples, a system includes a source device that provides encoded video data to be decoded at a later time by a destination device. In particular, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can include any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[0109] The destination device can receive the encoded video data to be decoded via the computer-readable medium. The computer-readable medium may comprise any type of medium or device capable of moving the encoded video data from the source device to the destination device. In one example, the computer-readable medium may comprise a communication medium that enables the source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations or any other equipment that may be useful to facilitate communication from the source device to the destination device.
[0110] In some examples, encoded data can be output from the output interface to a storage device. Likewise, encoded data can be accessed from the storage device via the input interface. The storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other digital storage media suitable for storing encoded video data. In another example, the storage device can correspond to a file server or other intermediate storage device that can store the encoded video generated by the source device. The destination device can access the stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device. Example file servers include a web server (for example, for a website), an FTP server, network-attached storage (NAS) devices, or a local disk drive. The destination device can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.) or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission or a combination thereof.
[0111] The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques can be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as Dynamic Adaptive Streaming over HTTP (DASH), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.
[0112] In one example, the source device includes a video source, a video encoder and an output interface. The destination device may include an input interface, a video decoder and a display device. The video encoder of the source device can be configured to apply the techniques disclosed herein. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. Likewise, the destination device can interface with an external display device, rather than including an integrated display device.
[0113] The example system above is merely one example. Techniques for processing video data in parallel can be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a "CODEC". In addition, the techniques of this disclosure can also be performed by a video preprocessor. The source device and the destination device are merely examples of such coding devices, in which the source device generates encoded video data for transmission to the destination device. In some examples, the source and destination devices may operate in a substantially symmetrical manner, such that each of the devices includes video encoding and decoding components. Thus, example systems can support one-way or two-way video transmission between video devices, for example, for video streaming, video playback, video broadcasting or video telephony.
[0114] The video source may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, the video source can generate computer graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder. The encoded video information can then be output by the output interface onto the computer-readable medium.
[0115] As noted, the computer-readable medium can include transient media, such as a wireless broadcast or wired network transmission, or storage media (i.e., non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Likewise, a computing device of a medium production facility, such as a disc stamping facility, can receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, the computer-readable medium can be understood to include one or more computer-readable media of various forms, in various examples.
[0116] A person of ordinary skill will appreciate that the less than ("<") and greater than (">") symbols or terminology used herein can be replaced with less than or equal to ("≤") and greater than or equal to ("≥") symbols, respectively, without departing from the scope of this description.
[0117] Specific details of an encoding device 104 and a decoding device 112 are shown in Figure 12 and Figure 13, respectively. Figure 12 is a block diagram illustrating an example encoding device 104 that can implement one or more of the techniques described in this disclosure. The encoding device 104 can, for example, generate the syntax structures described herein (for example, the syntax structures of a VPS, SPS, PPS or other syntax elements). The encoding device 104 can perform intra-prediction and inter-prediction coding of video blocks within video slices. As previously described, intra-coding relies, at least in part, on spatial prediction to reduce or remove spatial redundancy within a given video frame or image. Inter-coding relies, at least in part, on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. Intra-mode (I mode) can refer to any of several spatial-based compression modes. Inter-modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporal-based compression modes.
[0118] The encoding device 104 includes a partitioning unit 35, prediction processing unit 41, filter unit 63, image memory 64, adder 50, transform processing unit 52, quantization unit 54 and entropy coding unit 56. The prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. For video block reconstruction, the encoding device 104 also includes inverse quantization unit 58, inverse transform processing unit 60 and adder 62. Filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in Figure 12 as an in-loop filter, in other configurations, filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 can perform additional processing on encoded video data generated by the encoding device 104. The techniques of this disclosure can, in some cases, be implemented by the encoding device 104. In other cases, however, one or more of the techniques of this disclosure can be implemented by the post-processing device 57.
[0119] As shown in Figure 12, the encoding device 104 receives video data, and the partitioning unit 35 partitions the data into video blocks. The partitioning can also include partitioning into slices, slice segments, tiles or other larger units, as well as video block partitioning, for example, according to a quadtree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). The prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (for example, coding rate and level of distortion, or the like). The prediction processing unit 41 can provide the resulting intra- or inter-coded block to adder 50 to generate residual block data and to adder 62 to reconstruct the encoded block for use as a reference image.
[0120] The intra-prediction processing unit 46 within the prediction processing unit 41 can perform intra-prediction coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded, to provide spatial compression. The motion estimation unit 42 and motion compensation unit 44 within the prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference images to provide temporal compression.
[0121] The motion estimation unit 42 can be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate video slices in the sequence as P slices, B slices or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within a current video frame or image relative to a predictive block within a reference image.
[0122] A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which can be determined by sum of absolute differences (SAD), sum of squared differences (SSD) or other difference metrics. In some examples, encoding device 104 can calculate values for sub-integer pixel positions of reference images stored in image memory 64. For example, encoding device 104 can interpolate values of one-quarter pixel positions, one-eighth pixel positions or other fractional pixel positions of the reference image. Therefore, the motion estimation unit 42 can perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
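The two difference metrics named in this paragraph can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation: the function names `sad` and `ssd` and the sample blocks are hypothetical, and blocks are modeled as plain lists of rows of integer sample values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


# Hypothetical 2x2 blocks: the block being coded and one candidate match.
current = [[10, 12], [14, 16]]
candidate = [[11, 12], [13, 18]]

print(sad(current, candidate))  # 1 + 0 + 1 + 2 = 4
print(ssd(current, candidate))  # 1 + 0 + 1 + 4 = 6
```

A lower value indicates a closer match. SSD penalizes large individual differences more heavily than SAD does.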
[0123] The motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference image. The reference image can be selected from a first reference image list (List 0) or a second reference image list (List 1), each of which identifies one or more reference images stored in image memory 64. The motion estimation unit 42 sends the calculated motion vector to entropy coding unit 56 and motion compensation unit 44.
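As a rough sketch of how a motion vector could be derived by comparing block positions, the following performs an exhaustive integer-pixel search over a single reference image using SAD as the matching cost. The description does not specify a search strategy, so the exhaustive search, the function names (`full_search`, `extract_block`) and the synthetic frames are all assumptions made for illustration; fractional-pixel refinement and reference list selection are omitted.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract_block(frame, top, left, size):
    """Copy a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(current_frame, reference_frame, top, left, size, search_range):
    """Return ((dy, dx), cost) for the lowest-SAD block within the search range."""
    target = extract_block(current_frame, top, left, size)
    height, width = len(reference_frame), len(reference_frame[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, extract_block(reference_frame, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Synthetic data: the current frame is the reference shifted by (1, 2).
reference = [[10 * y + x for x in range(8)] for y in range(8)]
current = [[reference[(y + 1) % 8][(x + 2) % 8] for x in range(8)]
           for y in range(8)]

motion_vector, cost = full_search(current, reference, top=2, left=2,
                                  size=2, search_range=2)
print(motion_vector, cost)  # (1, 2) 0
```

Here the motion vector is expressed as the (vertical, horizontal) displacement from the block's position in the current frame to the matching block in the reference frame.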
[0124] Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in a reference image list. The encoding device 104 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block and can include both luma and chroma difference components. Adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and the video slice for use by the decoding device 112 in decoding the video blocks of the video slice.
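The subtraction attributed to adder 50 can be illustrated with a few lines of code. This is only a sketch for a single sample plane (for example, luma); the function name `residual_block` and the sample values are hypothetical.

```python
def residual_block(current, predicted):
    """Per-sample difference: current block minus predictive block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]


# Hypothetical 2x2 blocks of 8-bit sample values.
current = [[52, 55], [61, 59]]
predicted = [[50, 56], [60, 60]]

residual = residual_block(current, predicted)
print(residual)  # [[2, -1], [1, -1]]
```

The better the prediction, the closer the residual values are to zero, which is what makes the later transform and quantization stages effective.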
[0125] The intra-prediction processing unit 46 can intra-predict a current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 can determine an intra-prediction mode to use to encode a current block. In some examples, the intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, for example, during separate encoding passes, and the intra-prediction processing unit 46 (or mode selection unit 40, in some examples) can select an appropriate intra-prediction mode to use from the tested modes. For example, the intra-prediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and can select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. The intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
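One common way to turn a distortion and a bit count into a single comparable value is a Lagrangian cost, J = D + λ·R. The description only states that values are computed from the distortions and rates, so the Lagrangian form used below is an assumption swapped in for illustration, as are the function name `select_mode`, the mode labels and all of the numbers.

```python
def select_mode(candidates, lagrange_multiplier):
    """Pick the candidate with the lowest cost J = distortion + lambda * bits.

    candidates: iterable of (mode_name, distortion, bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lagrange_multiplier * c[2])


# Hypothetical results from trial-encoding one block with three modes.
candidates = [
    ("DC", 120.0, 10),          # high distortion, few bits
    ("planar", 100.0, 14),
    ("angular_26", 60.0, 30),   # low distortion, many bits
]

# Costs with lambda = 4.0: DC 160, planar 156, angular_26 180.
print(select_mode(candidates, lagrange_multiplier=4.0)[0])  # planar

# Costs with lambda = 1.0: DC 130, planar 114, angular_26 90.
print(select_mode(candidates, lagrange_multiplier=1.0)[0])  # angular_26
```

A larger multiplier weights the bit cost more heavily, so the choice shifts toward cheaper modes; a smaller one favors lower distortion.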
[0126] In any case, after selecting an intra-prediction mode for a block, the intra-prediction processing unit 46 can provide information indicative of the selected intra-prediction mode for the block to the entropy coding unit 56. The entropy coding unit 56 can encode the information indicating the selected intra-prediction mode. The encoding device 104 may include, in the transmitted bitstream, configuration data defining encoding contexts for various blocks, as well as indications of a most probable intra-prediction mode, an intra-prediction mode index table and a modified intra-prediction mode index table to use for each of the contexts. The bitstream configuration data can include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables).
[0127] After the prediction processing unit 41 generates the predictive block for the current video block through inter-prediction or intra-prediction, the encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block can be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
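To make the pixel-domain to frequency-domain conversion concrete, the sketch below applies a floating-point, orthonormal 2-D DCT-II to a small residual block by transforming rows and then columns. Practical codecs typically use integer approximations of the transform; the floating-point form, the function names `dct_1d` and `dct_2d`, and the sample block are assumptions chosen for readability.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)))
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    # Transpose back so the result is indexed [vertical_freq][horizontal_freq].
    return [list(row) for row in zip(*cols)]


# A flat 4x4 residual block: all of its energy lands in the DC coefficient.
flat_residual = [[3] * 4 for _ in range(4)]
coefficients = dct_2d(flat_residual)
print(round(coefficients[0][0], 6))  # 12.0
```

For smooth residuals most of the energy concentrates in a few low-frequency coefficients, which is why the transform precedes quantization.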
[0128] Transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then perform a scan of the matrix including the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the scan.
[0129] Following quantization, the entropy coding unit 56 entropy encodes the quantized transform coefficients. For example, entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. Following entropy coding by entropy coding unit 56, the encoded bitstream can be transmitted to decoding device 112 or archived for later transmission or retrieval by decoding device 112. Entropy coding unit 56 can also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.
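The effect of the quantization parameter mentioned above can be sketched with a uniform scalar quantizer. The mapping from quantization parameter to step size used here, in which the step doubles for every increase of 6, follows the convention of H.264/HEVC-style codecs and is an assumption, since the description does not give a formula. The function names and coefficient values are likewise hypothetical.

```python
import math


def step_size(qp):
    """Assumed QP-to-step mapping: the step doubles every 6 QP (1.0 at QP 4)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coefficients, qp):
    """Divide by the step size and round half away from zero."""
    step = step_size(qp)
    return [[int(math.copysign(math.floor(abs(c) / step + 0.5), c))
             for c in row] for row in coefficients]


def dequantize(levels, qp):
    """Inverse quantization: scale the integer levels back by the step size."""
    step = step_size(qp)
    return [[level * step for level in row] for row in levels]


coefficients = [[101.0, -37.0], [5.0, 3.0]]
levels = quantize(coefficients, qp=22)      # step size is 8.0 at QP 22
print(levels)                               # [[13, -5], [1, 0]]
print(dequantize(levels, qp=22))            # [[104.0, -40.0], [8.0, 0.0]]
```

The reconstructed coefficients differ from the originals, which illustrates that quantization is the lossy stage of the pipeline: a larger quantization parameter gives a coarser step, fewer bits and more distortion.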
[0130] The inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference image. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference images within a reference image list. The motion compensation unit 44 can also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in image memory 64. The reference block can be used by the motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or image.
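The addition attributed to adder 62 can be sketched as follows. Clipping the sum to the valid sample range is common practice but is not stated in this paragraph, so it is an assumption here, as are the function name `reconstruct`, the `bit_depth` parameter and the sample values.

```python
def reconstruct(predicted, residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, then clip to range."""
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value)
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]


# Hypothetical 2x2 prediction block and reconstructed residual block.
predicted = [[250, 3], [128, 60]]
residual = [[10, -5], [4, -1]]

reference_block = reconstruct(predicted, residual)
print(reference_block)  # [[255, 0], [132, 59]]
```

Because the encoder builds its reference from the same dequantized residual that a decoder would see, both sides predict from identical reference samples.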
[0131] In this way, the encoding device 104 of Figure 12 represents an example of a video encoder configured to generate syntax for an encoded video bitstream. The encoding device 104 can, for example, generate VPS, SPS and PPS parameter sets as described above. The encoding device 104 can perform any of the techniques described herein, including the processes described above. The techniques of this disclosure have generally been described with respect to the encoding device 104, but as mentioned above, some of the techniques of this disclosure can also be implemented by the post-processing device 57.
[0132] Figure 13 is a block diagram illustrating an example decoding device 112. The decoding device 112 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, adder 90, filter unit 91 and image memory 92. Prediction processing unit 81 includes motion compensation unit 82 and intra-prediction processing unit 84. The decoding device 112 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to the encoding device 104 of Figure 12.
[0133] During the decoding process, the decoding device 112 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements sent by the encoding device 104. In some embodiments, the decoding device 112 can receive the encoded video bitstream from the encoding device 104. In some embodiments, the decoding device 112 can receive the encoded video bitstream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/processor or other such device configured to implement one or more of the techniques described above. The network entity 79 may or may not include the encoding device 104. Some of the techniques described in this disclosure can be implemented by the network entity 79 before the network entity 79 transmits the encoded video bitstream to the decoding device 112. In some video decoding systems, the network entity 79 and the decoding device 112 may be parts of separate devices, while in other cases, the functionality described with respect to the network entity 79 can be performed by the same device that comprises the decoding device 112.
[0134] The entropy decoding unit 80 of the decoding device 112 entropy decodes the bitstream to generate quantized coefficients, motion vectors and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 112 can receive the syntax elements at the video slice level and/or the video block level. The entropy decoding unit 80 can process and parse both fixed-length syntax elements and variable-length syntax elements in one or more parameter sets, such as a VPS, SPS and PPS.
[0135] When the video slice is coded as an intra-coded (I) slice, the intra-prediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or image. When the video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks can be produced from one of the reference images within a reference image list. The decoding device 112 can construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference images stored in image memory 92.
[0136] Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 82 can use one or more syntax elements in a parameter set to determine a prediction mode (for example, intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (for example, B slice, P slice or GPB slice), construction information for one or more reference image lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
[0137] The motion compensation unit 82 can also perform interpolation based on interpolation filters. The motion compensation unit 82 can use interpolation filters as used by the encoding device 104 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, the motion compensation unit 82 can determine the interpolation filters used by the encoding device 104 from the received syntax elements and can use the interpolation filters to produce predictive blocks.
[0138] The inverse quantization unit 86 inverse quantizes, or de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoding unit 80. The inverse quantization process may include the use of a quantization parameter calculated by the encoding device 104 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. The inverse transform processing unit 88 applies an inverse transform (for example, an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
[0139] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 112 forms a decoded video block by summing the residual blocks from the inverse transform processing unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. The adder 90 represents the component or components that perform this summation operation. If desired, loop filters (either in the coding loop or after the coding loop) can also be used to smooth pixel transitions or to otherwise improve the video quality. Filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in Figure 13 as an in-loop filter, in other configurations, filter unit 91 can be implemented as a post-loop filter. The decoded video blocks in a given frame or image are then stored in the image memory 92, which stores reference images used for subsequent motion compensation. Image memory 92 also stores decoded video for later presentation on a display device.
[0140] In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Thus, although illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts can be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the invention described above can be used individually or jointly. In addition, the embodiments can be used in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are therefore to be regarded as illustrative rather than restrictive. For purposes of illustration, the methods have been described in a particular order. It should be appreciated that, in alternative embodiments, the methods can be performed in a different order than that described.
[0141] Where components are described as being configured to perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (for example, microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof.
[0142] The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans can implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
[0143] The techniques described herein can also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques can be implemented in any of a variety of devices, such as general purpose computers, wireless communication devices, or integrated circuit devices having multiple uses, including application in wireless communication devices and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques can be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM), for example synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, can be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves.
[0144] The program code can be executed by a processor, which can include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Such a processor can be configured to perform any of the techniques described in this disclosure. A general purpose processor can be a microprocessor; but, alternatively, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term processor, as used herein, can refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein can be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated into a combined video encoder-decoder (CODEC).

Claims:
[1]
1. Method for processing video data, comprising:
obtaining 360 degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region; and
mapping, using a cylindrical equal area projection, the intermediate region to one or more rectangular areas of an output video frame.
[2]
2. Method according to claim 1, wherein the video frame is segmented at a first latitude above an equator of the spherical representation and a second latitude below the equator, wherein the first latitude and the second latitude are equidistant from the equator, wherein the upper region is above the first latitude, and wherein the lower region is below the second latitude.
[3]
3. Method according to claim 1, wherein the intermediate region includes two thirds of the area of the spherical representation.
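As a worked illustration of claims 2 and 3: on a sphere, the band between latitudes -φ and +φ covers a fraction sin(φ) of the total surface area, so a two-thirds intermediate region corresponds to cut latitudes of about ±41.81 degrees. The sketch below computes that latitude and classifies a point by region. The function names are hypothetical, and the claims do not prescribe any particular implementation.

```python
import math


def cut_latitude_for_fraction(fraction):
    """Latitude (radians) such that the band between -lat and +lat covers
    `fraction` of the sphere's area. Follows from band area = 4*pi*R^2*sin(lat)."""
    return math.asin(fraction)


def classify_region(lat, lat_cut):
    """Assign a latitude (radians) to the upper, intermediate, or lower region."""
    if lat > lat_cut:
        return "upper"
    if lat < -lat_cut:
        return "lower"
    return "intermediate"


lat_cut = cut_latitude_for_fraction(2.0 / 3.0)
print(round(math.degrees(lat_cut), 2))  # about 41.81 degrees
```

Because the two cut latitudes are symmetric about the equator, the upper and lower regions are circular caps of equal area, each covering one sixth of the sphere under this assumption.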
[4]
4. Method according to claim 1, wherein mapping the intermediate region includes:
selecting a pixel location in the output video frame;
determining a point in the spherical representation corresponding to the pixel location, wherein the point in the spherical representation is determined using a mapping to convert a two-dimensional rectangle into a three-dimensional sphere;
sampling a pixel at the point in the spherical representation; and assigning the sampled pixel to the pixel location.
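The steps of claim 4 can be sketched as follows, assuming a Lambert cylindrical equal area projection in which the horizontal axis is linear in longitude and the vertical axis is linear in the sine of latitude. The constant `SIN_LAT_MAX`, the pixel-centre convention, and the `sample_sphere` callback are assumptions of this sketch, not details taken from the claims.

```python
import math

# Assumed: the intermediate region spans |sin(latitude)| <= 2/3,
# i.e. two thirds of the sphere's area.
SIN_LAT_MAX = 2.0 / 3.0


def pixel_to_sphere(u, v, width, height):
    """Map the centre of output pixel (u, v) to (longitude, latitude) in radians."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    # Equal-area property: the vertical coordinate is linear in sin(latitude).
    sin_lat = SIN_LAT_MAX * (1.0 - 2.0 * (v + 0.5) / height)
    return lon, math.asin(sin_lat)


def render_middle_region(sample_sphere, width, height):
    """For every output pixel: find its sphere point, sample there, assign the result."""
    return [
        [sample_sphere(*pixel_to_sphere(u, v, width, height)) for u in range(width)]
        for v in range(height)
    ]


def toy_sphere(lon, lat):
    """Hypothetical stand-in for real sphere sampling: returns latitude in degrees."""
    return math.degrees(lat)


frame = render_middle_region(toy_sphere, width=4, height=2)
```

Iterating over output pixels rather than over sphere points guarantees that every pixel location in the rectangle receives exactly one sample.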
[5]
5. Method according to claim 1, wherein the intermediate region includes a left view, a front view, and a right view, wherein the left view is placed in the output video frame adjacent to the front view, and wherein the right view is placed adjacent to the front view.
[6]
6. Method according to claim 1, wherein the intermediate region includes a rear view, wherein the lower region is placed in the output video frame adjacent to the rear view, and wherein the upper region is placed adjacent to the rear view.
[7]
7. Method according to claim 1, further comprising:
mapping the upper region into the output video frame; and
mapping the lower region into the output video frame.
[8]
8. Method according to claim 1, wherein the output video frame has a three-by-two aspect ratio.
[9]
9. Video encoding device comprising:
a memory configured to store 360 degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame; and a processor configured to:
segment a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region; and
map, using a cylindrical equal area projection, the intermediate region to one or more rectangular areas of an output video frame.
[10]
10. Non-transitory computer-readable medium having stored instructions on it that, when executed by one or more processors, cause the one or more processors to perform operations including:
obtaining 360 degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region; and
mapping, using a cylindrical equal area projection, the intermediate region to one or more rectangular areas of an output video frame.
[11]
11. Apparatus, comprising:
means for obtaining 360 degree video data including a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
means for segmenting a video frame from the plurality of video frames into an upper region, an intermediate region, and a lower region, the upper region including a first circular area of the spherical representation, the lower region including a second circular area of the spherical representation that is opposite on the spherical representation from the first circular area, wherein the intermediate region includes an area of the spherical representation not included in the upper region or the lower region; and
means for mapping, using a cylindrical equal area projection, the intermediate region to one or more rectangular areas of an output video frame.
[12]
12. Method for processing video data, comprising:
obtaining 360 degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame;
identifying one or more rectangular areas of a video frame from the plurality of video frames; and
mapping, using a cylindrical equal area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[13]
13. The method of claim 12, wherein the upper region includes a surface of the spherical representation above a first latitude of the spherical representation, wherein the lower region includes a surface of the spherical representation below a second latitude of the spherical representation, and wherein the first latitude and the second latitude are equidistant from an equator of the spherical representation.
14. Method according to claim 12, wherein the one or more rectangular areas include two thirds of an area of the video frame.
15. Method according to claim 12, wherein mapping the one or more rectangular areas includes:
selecting a point in the spherical representation;
determining a pixel location in the video frame that corresponds to the point, wherein the pixel location is determined using a mapping to convert a three-dimensional sphere to a two-dimensional rectangle;
sampling a pixel from the pixel location; and assigning the sampled pixel to the point.
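The decoder-side steps of claim 15 run in the opposite direction: a point on the sphere is chosen first, and the corresponding pixel location in the two-dimensional frame is looked up. The sketch below assumes a Lambert cylindrical equal area layout whose rows span |sin(latitude)| ≤ 2/3, and uses nearest-neighbour sampling. Those choices, and the names `sphere_to_pixel` and `sample_nearest`, are illustrative assumptions only.

```python
import math

# Assumed extent of the intermediate region: |sin(latitude)| <= 2/3.
SIN_LAT_MAX = 2.0 / 3.0


def sphere_to_pixel(lon, lat, width, height):
    """Map (longitude, latitude) in radians to fractional pixel coordinates (u, v)."""
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    # Rows are spaced linearly in sin(latitude), which preserves area.
    v = (1.0 - math.sin(lat) / SIN_LAT_MAX) * height / 2.0 - 0.5
    return u, v


def sample_nearest(frame, lon, lat):
    """Sample the frame at the pixel nearest to the given sphere point."""
    height, width = len(frame), len(frame[0])
    u, v = sphere_to_pixel(lon, lat, width, height)
    col = int(round(u)) % width                    # longitude wraps around
    row = min(max(int(round(v)), 0), height - 1)   # clamp at the band edges
    return frame[row][col]


# Hypothetical 4x2 frame holding the intermediate region.
frame = [[10, 11, 12, 13], [20, 21, 22, 23]]
value = sample_nearest(frame, 0.25 * math.pi, math.asin(1.0 / 3.0))
```

A practical renderer would likely interpolate between neighbouring pixels instead of taking the nearest one; nearest-neighbour sampling is used here only to keep the sketch short.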
[14]
16. The method of claim 12, wherein the one or more rectangular areas include a left view, a front view and a right view, where the left view is located adjacent to the front view, and where the right view is adjacent to the front view.
[15]
17. Method according to claim 12, wherein the one or more rectangular areas include a rear view, wherein a first rectangular area including a bottom view is adjacent to the rear view, and wherein a second rectangular area including a top view is adjacent to the rear view.
[16]
18. The method of claim 12, further comprising:
mapping a first rectangular area of the video frame into the upper region; and mapping a second rectangular area of the video frame into the lower region.
19. Method according to claim 12, wherein the video frame has a three-by-two aspect ratio.
20. Video coding device comprising:
a memory configured to store 360 degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame; and a processor configured to:
identify one or more rectangular areas of a video frame from the plurality of video frames; and
map, using a cylindrical equal area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[17]
21. Non-transitory computer-readable medium having stored instructions on it that, when executed by one or more processors, cause the one or more processors to perform operations including:
obtaining 360 degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame;
identifying one or more rectangular areas of a video frame from the plurality of video frames; and
mapping, using a cylindrical equal area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.
[18]
22. Apparatus, comprising:
means for obtaining 360-degree video data including a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame;
means for identifying one or more rectangular areas of a video frame from the plurality of video frames; and
means for mapping, using a cylindrical equal area projection, the one or more rectangular areas to an intermediate region of a spherical representation of the video data, the spherical representation further including an upper region and a lower region, wherein the intermediate region is located between the upper region and the lower region.

Patent family:

Publication number | Publication date

SG11201907264UA|2019-10-30|

TW201903710A|2019-01-16|

US10839480B2|2020-11-17|

US20180276789A1|2018-09-27|

AU2018239448A1|2019-08-29|

CN110383843A|2019-10-25|

EP3603073A1|2020-02-05|

KR20190128211A|2019-11-15|

WO2018175611A1|2018-09-27|

Legal status:
2021-10-19| B350| Update of information on the portal [chapter 15.35 patent gazette]|

Priority:

Application number | Filing date | Patent title

US201762475103P| true| 2017-03-22|2017-03-22|

US15/926,732|US10839480B2|2017-03-22|2018-03-20|Sphere equator projection for efficient compression of 360-degree video|

PCT/US2018/023601|WO2018175611A1|2017-03-22|2018-03-21|Sphere equator projection for efficient compression of 360-degree video|
